Abstract 

Conditional Poisson models have been used to analyze vaccine safety data from the self-controlled 
case series (SCCS) design. In this paper, we derived the likelihood function of fixed effects 
models for analyzing SCCS data and showed that the likelihoods from fixed effects models and 
conditional Poisson models were proportional. Thus, the maximum likelihood estimates (MLEs) 
of the coefficients of time-varying variables, including the vaccination effect, were equal for the 
fixed effects model and the conditional Poisson model. We performed a simulation study to compare 
empirical type I errors, means and standard errors of the vaccination effect coefficient, and empirical 
powers among conditional Poisson models, fixed effects models, and generalized estimating equations 
(GEE), which have been commonly used for analyzing longitudinal data. The simulation study showed that 
both fixed effects models and conditional Poisson models generated the same estimates and 
standard errors for time-varying variables, while the GEE approach produced different results for some 
data sets. We also analyzed SCCS data from a vaccine safety study examining the association 
between measles-mumps-rubella (MMR) vaccination and idiopathic thrombocytopenic purpura 
(ITP). In analyzing the MMR-ITP data, likelihood-based statistical tests were employed to test the 
impact of a time-invariant variable on the vaccination effect. In addition, a complex semi-parametric 
model was fitted by simply treating unique event days as indicator variables in the fixed effects 
model. We conclude that, theoretically, fixed effects models provide the same MLEs as conditional 
Poisson models. Because fixed effects models are likelihood based, they have the potential to address 
methodological issues in vaccine safety studies, such as how to identify the optimal risk window and 
how to analyze SCCS data with misclassification of adverse events. 
Introduction 

The association between particular adverse events following immunization (AEFI) and 
receipt of a specific vaccine has been studied using large electronically-linked health care 
utilization databases [1-8]. As an example, the Vaccine Safety Datalink (VSD) project uses 
electronic data, including vaccination and diagnosis data, on 8.8 million managed care 
enrollees to study not only common AEFIs (eg, fever, soreness), but also rare AEFIs (eg, 
death, seizures, idiopathic thrombocytopenic purpura (ITP)). 

Since individuals in observational settings such as the VSD are not randomly chosen to be 
vaccinated, vaccinated and unvaccinated individuals may differ greatly and possibly in ways 
related to the outcome of interest. This confounding bias, if not accounted for properly, may 
invalidate analytic results of cohort studies. In addition, traditional study designs such as 
matched cohort and case-control designs may not even be feasible for studying vaccine 
safety because 1) the coverage of some vaccines is nearly 100%, so there are not enough 
unvaccinated individuals to use for the control group; and 2) data are not collected in safety 
surveillance system for those who did not experience/report an adverse event (eg, the 
Vaccine Adverse Event Reporting System). For these reasons, a method known as the self- 
controlled case series (SCCS) has been developed and widely used for vaccine safety studies 
[7-11]. The SCCS is a case-only method in which a subject's follow-up period is partitioned 
into risk and control intervals. SCCS data are typically analyzed using conditional Poisson 
models by conditioning on the marginal total number of adverse events that occurred in an 
individual. The resulting likelihood kernel does not contain the individual-specific random 
coefficients that explain each individual's baseline risk for the event count of interest 
(Farrington, 1995). By making within-person comparisons of incidence rates between 
vaccine exposed and unexposed time intervals, conditional Poisson models implicitly adjust 
for all time-invariant individual-level risk factors and potential confounders (measured and 
not measured). 

Several statistical software packages (Stata, SAS, and GenStat) can be used to fit Poisson 
models to SCCS data [10,12,13]. Although not explicitly mentioned, fixed 
effects models were used to analyze SCCS data in a tutorial paper by Whitaker et al. [10]. 
SCCS data have a longitudinal data structure in which multiple observations are those 
intervals defined by vaccination exposure status and time-varying covariates on an 
individual. The number of observations depends on the number of risk levels and levels of 
time-varying covariates. Fixed effects models have been used for analyzing data with 
multiple observations on an individual in cohort studies, randomized or observational [14]. 
They do not assume distributions for the individual-specific random coefficients. The 
random coefficients are instead estimated from the data as fixed effects in the fixed effects 
models. 

In this paper, we demonstrate that the likelihoods from the fixed effects model and the 
conditional Poisson model are proportional when analyzing SCCS data. We show by 
simulation that fixed effects models and the conditional Poisson models typically used for SCCS 
data analysis are equivalent in estimating vaccination effects. We also compare these two 
approaches to a generalized estimating equation (GEE) approach [15,16], which is widely 
used for longitudinal data analysis. Furthermore, we demonstrate with an SCCS example that 
fixed effects models have several advantages over conditional Poisson models. 

Methods 

Conditional Poisson models for self-controlled case series design 

If a Poisson process is assumed for the unrestricted population in a cohort design, the 
likelihood function, conditioning on each individual's total number of adverse events, is a 
multinomial kernel containing only the parameters $\beta$ for time-varying covariates, including 
the exposure effect $\beta_1$. Since the individual effects cancel out in this analysis, no 
assumption about their distribution is expected to influence inference on $\beta$. 
The multinomial kernel equals one (its logarithm is zero) for all individuals with no adverse 
event. Thus, individuals without adverse events do not contribute to parameter estimation. 

Suppose we have a sample of n individuals with adverse events. Let $R = \exp(\beta_1)$ denote the 
incidence rate ratio for the vaccination effect, where $\beta_1$ is the model coefficient for the 
vaccination effect. Farrington (1995) proposed a conditional Poisson model that uses only 
cases, vaccinated and unvaccinated. The conditional Poisson likelihood kernel is the product 
of the likelihood kernels across subjects, which has the following form for the $i$-th subject: 



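$$L_i^{CP}(\beta) = \prod_{j} \left( \frac{t_{ij}\exp(X_{ij}\beta)}{\sum_{k} t_{ik}\exp(X_{ik}\beta)} \right)^{y_{ij}} \qquad (1)$$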



Here, $X_{ij}$ is the row vector of time-varying covariates including indicator variables for age 
effects and vaccination effect, $\beta$ is a column vector of corresponding coefficients including 
$\beta_1$, $t_{ij}$ is the person-time (in days) for subject $i$ in interval $j$, and $y_{ij}$ is the 
corresponding number of adverse events for subject $i$ in interval $j$, which is binary when the 
adverse event is rare. The conditional Poisson regression model (1) allows for more than one risk 
level in the risk window [10]. 

Fixed effects models for SCCS data: parametric and semi-parametric models 

SCCS data have longitudinal data structures where analytic units are intervals of each 
subject. Each interval represents a unique combination of time-varying covariates. Although 
each individual has at least one adverse event, some intervals of an individual do not have an 
adverse event. In this paper it is assumed that the observations in an SCCS data set follow a 
Poisson distribution in which the individual effects may be represented by a random 
coefficient. As a consequence, the likelihood function from the fixed effects model is the 
full likelihood function for SCCS data. Let $\xi_i$ represent the individual-specific random 
coefficient for individual $i$, who experienced at least one adverse event during the follow-up 
period $T_i$; no distribution, such as a normal distribution, is assumed for $\xi_i$. 

We consider fitting the fixed effects models for SCCS data, 



$$y_{ij} \sim \mathrm{Poisson}(\mu_{ij}) \qquad (2)$$

where $\mu_{ij} = t_{ij} m_i \lambda_{ij}$, $m_i = \exp(\alpha + \xi_i)$, and $\lambda_{ij} = \exp(X_{ij}\beta)$. Here $\alpha$ 
represents the intercept coefficient for the overall case sample, $\xi_i$ is the unknown 
individual-specific random effect, $X_{ij}$ and $\beta$ have the same meanings as defined previously 
for model (1), and $\exp(\alpha)$ is the baseline incidence rate for the adverse events for each unit 
of unvaccinated period of time (eg, each day) in the case sample. Similar to the likelihood 
function for cohort data [14], the likelihood function of the fixed effects model for 
individual $i$ when using case-only data is 



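$$L_i^{FE} = \prod_{j=1}^{J_i} \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!}$$

where $J_i$ is the number of observations on individual $i$. Substituting $\mu_{ij} = t_{ij} m_i \lambda_{ij}$, 

$$L_i^{FE} = \prod_{j} \frac{\exp(-t_{ij} m_i \lambda_{ij})\,(t_{ij} m_i \lambda_{ij})^{y_{ij}}}{y_{ij}!}$$

Taking the natural logarithm of $L_i^{FE}$, differentiating with respect to $m_i$, and setting the 
derivative to zero, we have 

$$\hat{m}_i = \frac{\sum_{j} y_{ij}}{\sum_{j} t_{ij}\lambda_{ij}}$$

Substituting $\hat{m}_i$ into $L_i^{FE}$ and writing $n_i = \sum_{j} y_{ij}$ for the total number of events of 
individual $i$, the likelihood function for the $i$-th individual is 

$$L_i^{FE} = \frac{\exp(-n_i)\,n_i^{n_i}}{\prod_{j} y_{ij}!} \prod_{j} \left( \frac{t_{ij}\lambda_{ij}}{\sum_{k} t_{ik}\lambda_{ik}} \right)^{y_{ij}}$$

Substituting $\lambda_{ij} = \exp(X_{ij}\beta)$ into $L_i^{FE}$, 

$$L_i^{FE}(\beta) = \frac{\exp(-n_i)\,n_i^{n_i}}{\prod_{j} y_{ij}!} \prod_{j} \left( \frac{t_{ij}\exp(X_{ij}\beta)}{\sum_{k} t_{ik}\exp(X_{ik}\beta)} \right)^{y_{ij}} \qquad (3)$$

The leading factor in (3) does not depend on $\beta$, so the likelihood function from a fixed 
effects model as in (3) is proportional to that from a conditional Poisson model as in (1). As a 
result, the maximum likelihood estimates of $\beta$ are the same for the two models. 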
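
The proportionality can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a single covariate (the risk-window indicator) and a hypothetical individual with three intervals, evaluates the conditional Poisson kernel (1) and the Poisson likelihood of model (2) with the individual effect replaced by its maximum likelihood value, and compares the two on the log scale at several values of the coefficient.

```python
import math

def conditional_loglik(beta, t, x, y):
    """Log of the conditional Poisson kernel (1) for one individual."""
    w = [tj * math.exp(beta * xj) for tj, xj in zip(t, x)]
    total = sum(w)
    return sum(yj * math.log(wj / total) for yj, wj in zip(y, w))

def profiled_fe_loglik(beta, t, x, y):
    """Poisson log-likelihood of model (2) with m_i set to its MLE."""
    w = [tj * math.exp(beta * xj) for tj, xj in zip(t, x)]
    m_hat = sum(y) / sum(w)  # closed-form maximiser over m_i
    ll = 0.0
    for yj, wj in zip(y, w):
        mu = m_hat * wj
        ll += -mu + yj * math.log(mu) - math.lgamma(yj + 1)
    return ll

# Hypothetical individual (illustrative values, not from the paper):
# control, risk, control intervals with person-days t and event counts y.
t = [120.0, 42.0, 203.0]
x = [0, 1, 0]
y = [1, 2, 0]

# The gap between the two log-likelihoods should not depend on beta.
gaps = [profiled_fe_loglik(b, t, x, y) - conditional_loglik(b, t, x, y)
        for b in (-1.0, 0.0, 0.69, 1.79)]
print(gaps)
```

A constant gap across coefficient values means the two likelihoods differ only by a multiplicative factor free of the coefficient, so both are maximised at the same value.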

Age is an important confounder in vaccine safety studies because of its possible association 
with both the probability of vaccination and the occurrence of adverse events. A model with 
a defined age effect form such as the age groups used in the MMR-ITP example is called a 
parametric model while a model without defining age effect form is called a semi-parametric 
model [10,12]. Fixed effects models as in (3) can be easily modified to fit the semi- 
parametric model in which the form of age effects is not specified. Each event age is treated 
as a dummy variable, so essentially the length of an age group is only one day, and the 
number of age effect levels is the number of unique days of adverse events. 
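
As an illustration of the semi-parametric set-up, the sketch below builds one-day age levels from the unique event days. It is only a sketch under stated assumptions: the record layout (`id`, `start`, `end`, `vax_day`, `event_days`), the function name, and the 42-day risk window starting on the vaccination day are hypothetical choices made for the example.

```python
def semiparametric_rows(cases, risk_days=42):
    """Build one row per individual per unique event day (one-day age levels).

    `cases` is a list of dicts with hypothetical keys: id, start, end,
    vax_day (or None if unvaccinated), and event_days (list of ages in days).
    """
    # Age-effect levels are the unique days on which any event occurred.
    levels = sorted({d for c in cases for d in c["event_days"]})
    rows = []
    for c in cases:
        for d in levels:
            if not c["start"] <= d <= c["end"]:
                continue  # day lies outside this individual's follow-up
            vax = c["vax_day"]
            in_risk = vax is not None and vax <= d < vax + risk_days
            rows.append({"id": c["id"], "age_level": d, "risk": int(in_risk),
                         "events": c["event_days"].count(d), "person_days": 1})
    return levels, rows

cases = [
    {"id": "A", "start": 366, "end": 730, "vax_day": 400, "event_days": [415]},
    {"id": "B", "start": 366, "end": 600, "vax_day": 500, "event_days": [450]},
]
levels, rows = semiparametric_rows(cases)
print(levels)
print(rows)
```

Each `age_level` would then enter the fixed effects model as an indicator variable, alongside the individual factor and the risk indicator.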
Results 

Simulation study and results 

We performed simulations to evaluate the performance of a fixed effects model in its ability 
to analyze SCCS data. We also compared estimates from a fixed effects model and a 
conditional Poisson model to those obtained using generalized estimating equations (GEE) 
[15,16], which have been widely used to analyze longitudinal data, including normally 
distributed, binary, and Poisson data. GEE is not a likelihood-based approach; it 
estimates population-average parameters while accounting for correlation among observations 
on the same individual. 

Simulation. Dependent Poisson data were simulated according to model (2), with the 
individual-specific random effects $\xi_i$ independent and identically distributed as $N(0, \sigma^2)$ 
or as uniform on (0, 1). $\sigma$ was chosen to be 1 and 1.5. Each of 100 individuals was assumed to 
have a follow-up period of 365 days, consisting of both risk and control periods. Typically, 
there are three periods for each subject: the control period before vaccination, the risk period 
after vaccination, and the control period after the risk period. Vaccination times were assumed 
to follow a normal distribution with a mean of 140 days and a standard deviation of 42 days. The 
values of the overall intercept coefficient, $\alpha$, were set to -5, -6, -7, and -8 to achieve 
baseline incidence rates ranging from 0.00034 to 0.00674 per day in the simulated data. 
Time-varying covariates included in the model were age effects and an indicator variable for risk 
and control periods. The chosen coefficients for age effects were 0.3 for days 91-270 and 
0.1 for days 271-365, with days 1-90 as the reference group. The coefficient for the vaccination 
effect, $\beta_1$, was chosen to be 0.69, 1.39, and 1.79, which represent incidence rate ratios of 2, 
4, and 6, respectively. For each combination of parameters, 1,000 datasets were simulated and 
analyzed. 
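
The data-generating process described above can be sketched as follows. This is an illustrative reading of the design, not the authors' simulation code: the 42-day risk window starting on the vaccination day, the clipping of vaccination days to the follow-up period, the day-by-day generation of events, and all function names are assumptions made for the example.

```python
import math
import random

def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def age_effect(day):
    """Age coefficients from the text; days 1-90 are the reference group."""
    if day <= 90:
        return 0.0
    return 0.3 if day <= 270 else 0.1

def simulate_sccs(n=100, alpha=-5.0, sigma=1.0, beta1=0.69,
                  follow_up=365, risk_days=42, seed=1):
    """Simulate daily Poisson events under model (2); keep cases only."""
    rng = random.Random(seed)
    cases = []
    for i in range(n):
        xi = rng.gauss(0.0, sigma)  # individual-specific effect
        # Vaccination day, clipped to the follow-up period (assumption).
        vax = min(max(round(rng.gauss(140, 42)), 1), follow_up)
        event_days = []
        for day in range(1, follow_up + 1):
            in_risk = vax <= day < vax + risk_days  # assumed risk window
            rate = math.exp(alpha + xi + age_effect(day) + beta1 * in_risk)
            event_days += [day] * draw_poisson(rng, rate)
        if event_days:  # SCCS uses only individuals with at least one event
            cases.append({"id": i, "vax_day": vax, "event_days": event_days})
    return cases

cases = simulate_sccs()
print(len(cases), sum(len(c["event_days"]) for c in cases))
```

With `alpha` equal to -5 and -8, `math.exp(alpha)` gives the baseline daily rates of about 0.00674 and 0.00034 quoted in the text.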

Evaluation measures. We analyzed each simulated dataset with three methods: a 
conditional Poisson model, a fixed effects model, and GEE. Means of the vaccination effect 
estimates and their standard errors were calculated for each of the three methods in each 
simulated data setting. We also calculated the mean of the absolute difference between the 
conditional Poisson and GEE estimates and reported the maximum absolute difference. We conducted 
this last step because the means of the vaccination effect estimates may be the same for 
conditional Poisson models and GEE even though the estimates differ in individual simulated data 
sets. Empirical powers were calculated as the percentage of data sets with a significant 
vaccination effect (p-value < 0.05) under the alternative hypothesis. Type I error rates were also 
reported for data simulated under the null hypothesis, $\beta_1 = 0$. 
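
A minimal sketch of these summary measures is given below. It assumes that significance is judged with a two-sided Wald test on the vaccination coefficient, which the text does not state, and the function names and input values are hypothetical.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value from a normal approximation (assumed test)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_rate(estimates, std_errors, level=0.05):
    """Share of data sets with p < level: empirical power under the
    alternative, or empirical type I error rate under the null."""
    hits = sum(wald_p_value(b, s) < level
               for b, s in zip(estimates, std_errors))
    return hits / len(estimates)

def abs_difference_summary(est_a, est_b):
    """Mean and maximum absolute difference between two sets of estimates."""
    diffs = [abs(a - b) for a, b in zip(est_a, est_b)]
    return sum(diffs) / len(diffs), max(diffs)

# Hypothetical estimates from three simulated data sets.
rate = rejection_rate([0.0, 1.0, 2.0], [1.0, 1.0, 0.5])
mean_diff, max_diff = abs_difference_summary([0.70, 0.65], [0.71, 0.60])
print(rate, mean_diff, max_diff)
```

Reporting the maximum alongside the mean is what exposes individual data sets where two methods disagree even when they agree on average.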

Simulation results. Table 1 shows that conditional Poisson models, fixed effects 
models, and GEE have acceptable type I error rates (i.e. about 5%) for the parameters 
examined when the individual-specific random effects are simulated from a normal 
distribution. The GEE approach produces slightly different estimates and standard 
errors for the vaccination effect than conditional Poisson models and fixed effects models 
(Table 2). Empirical powers are also similar among the three approaches under the 
alternative hypothesis $\beta_1 \neq 0$. Although there is little difference on average in parameter 
estimation between GEE and conditional Poisson models, the absolute differences 
between conditional Poisson models and GEE reveal that they can produce very different 
results for some simulated data sets. For example, the maximum absolute difference in the 
estimated coefficient for the vaccine effect is 0.339 in one simulated data set, while the mean 
of the absolute difference is only 0.0127 (Table 3). Similar results are observed for $\beta_1 = 1.39$ 
and 1.79 (data not shown). When the individual-specific random effects are simulated from a 
uniform distribution, similar results are observed as well (data not shown). 
An example 

To demonstrate how a fixed effects model approach can be used for SCCS data, we 
re-analyzed the data from an SCCS study examining the association between MMR vaccination 
and ITP among a US pediatric population [7]. The original investigators excluded the 42-day 
healthy period immediately preceding MMR vaccination and used a pre-specified 42-day risk window 
after MMR vaccination, which has subsequently been confirmed by a data-driven approach to be 
an optimal choice of risk window length [17]. They found an incidence rate ratio of 7.06 for 
those vaccinated at an age between 366 and 690 days old. In this paper, for the purpose of 
demonstrating the use of fixed effects models in analyzing SCCS data, we defined six age 
groups as in Xu et al. [17]: 366-426 days, 427-487 days, 488-548 days, 549-609 days, 
610-670 days, and 671-730 days (the reference group). Days outside of a 42-day 
post-vaccination risk window were considered to be the control window. We also included all 
subjects with follow-up of 366-730 days regardless of vaccination status and did not 
exclude the healthy period before MMR vaccination. 

The first step in applying a fixed effects model to SCCS data involves manipulating the data 
to a more usable format. Specifically, the analytic data can be expanded as described in Xu 
et al. [13]. Briefly, the follow-up period for each individual in the SCCS sample was 
expanded into daily observations with an exposure status indicator (risk versus control), 
indicators for age effects (defined age effects group for parametric model or undefined age 
effects form for semi-parametric model), an indicator variable for adverse events, and any 
other time-invariant and time-varying covariates. Data in this expanded form facilitate 
simple calculations of the person time (in days) and number of adverse events by exposure 
status, age groups, and any other time-varying covariates for each individual. The result is a 
data set with multiple observations for each individual, which can be analyzed using a fixed 
effects model with most statistical software packages (eg, PROC GENMOD in SAS). 
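
The expansion and aggregation step can be sketched as follows. This is a minimal illustration rather than the procedure of Xu et al. [13]: the function names and the example subject are hypothetical, and the risk window is assumed to start on the vaccination day and to last 42 days.

```python
from collections import defaultdict

# Age groups (in days) used in the MMR-ITP example.
AGE_GROUPS = [(366, 426), (427, 487), (488, 548),
              (549, 609), (610, 670), (671, 730)]

def age_group(day):
    """Return the label of the age group that contains `day`."""
    for lo, hi in AGE_GROUPS:
        if lo <= day <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"day {day} is outside the follow-up range")

def expand_and_aggregate(subject_id, start, end, vax_day, event_days,
                         risk_days=42):
    """Expand follow-up into daily observations, then sum person-days and
    events within each (subject, exposure status, age group) cell."""
    cells = defaultdict(lambda: {"person_days": 0, "events": 0})
    for day in range(start, end + 1):
        in_risk = vax_day is not None and vax_day <= day < vax_day + risk_days
        key = (subject_id, int(in_risk), age_group(day))
        cells[key]["person_days"] += 1
        cells[key]["events"] += event_days.count(day)
    return dict(cells)

# Hypothetical subject: vaccinated at day 400, one event at day 415.
cells = expand_and_aggregate("A", 366, 730, 400, [415])
for key, value in sorted(cells.items()):
    print(key, value)
```

Each resulting cell is one observation for the fixed effects model, with the logarithm of `person_days` as the offset and the subject identifier as a factor.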

Table 4 shows that conditional Poisson models and fixed effects models give the same 
estimates and standard errors for both the vaccination effect and the age effects when using 
the parametric method for age effects. Estimates and standard errors obtained using a GEE 
approach differ only slightly. For the semi-parametric method, the conditional Poisson 
model and fixed effects model yield the same estimates for the vaccination effect (1.97) and 
standard error (0.37). Estimates using GEE are not available when using the semi-parametric 
model because the individual-specific coefficients cannot be modeled as both fixed and 
random effects in the same model. 

To show one of many potential utilities of using a fixed effects model for SCCS data, we 
also compared the risk of ITP after MMR vaccination between male and female patients. If 
conditional Poisson models are employed, gender subgroup analyses must be carried out to 
make this comparison, and a direct statistical comparison (test) is not readily available. 
However, if a fixed effects model is used, we can statistically compare the risk of ITP 
between genders by fitting a model with two interactions: 1) between exposure and gender 
and 2) between age groups and gender (Table 5). The test statistic and p-value for this 
comparison are readily available in the output of standard statistical software (eg, using estimate 
and contrast statements in PROC GENMOD in SAS). Additionally, the fixed effects model 
can accommodate the same age effects for both male and female patients by replacing the 
interaction between age groups and gender with age groups only (Table 5). This model 
produced a smaller gender difference in the risk of ITP after MMR vaccination than the model 
with different age coefficients for male and female patients, although the conclusion is the same 
(p-value = 0.76 versus 0.31). In this way, the fixed effects modeling approach offers 
considerable additional flexibility beyond what a conditional Poisson approach can provide. 
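
As a worked illustration of the gender comparison, the sketch below applies a Wald-type test to the gender-specific vaccination coefficients reported in Table 5 for the model with both interactions. It assumes the two estimates are independent, which is reasonable when every coefficient is interacted with gender; it is not the SAS estimate or contrast computation used in the paper, and the function name is hypothetical.

```python
import math

def wald_difference(est_a, se_a, est_b, se_b):
    """Wald test for the difference of two independent estimates."""
    diff = est_a - est_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, z, p

# Table 5, model with gender*age and exposure*gender:
# female 2.64 (0.64), male 1.85 (0.45).
diff, z, p = wald_difference(2.64, 0.64, 1.85, 0.45)
print(round(diff, 2), round(z, 2), round(p, 2))
```

Under these assumptions the p-value is about 0.31, in line with the value reported for the model with separate age coefficients. The same shortcut does not apply to the model with common age effects, where the two gender-specific estimates are correlated through the shared age coefficients.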
Discussion 

Fixed effects models were first implicitly recommended for analyzing SCCS data by 
Whitaker et al. [10]. They included a factor for each individual to ensure that the fitted 
individual totals equal the observed values, but did not study the relation between conditional 
Poisson models and fixed effects models. In this paper, we demonstrate, theoretically 
and by simulation, that the parameter estimates and their standard errors are identical between 
conditional Poisson models and fixed effects models. On average, the GEE approach 
produces very similar estimates and standard errors for vaccination effects, although for 
some simulated data sets its results can differ markedly from those of conditional Poisson 
models. We do not recommend using the GEE approach to analyze SCCS data because of possible 
bias in estimation. 

There are several advantages for statistical analysts in vaccine safety research in using 
fixed effects models to analyze SCCS data. First, likelihood-based statistical tests may be 
employed. For example, a likelihood ratio test was used to investigate the effect of gender 
on the association between oral polio vaccine and intussusception [10]. Second, statistical 
analysts may find fitting fixed effects models for SCCS data more intuitive than fitting 
conditional Poisson models in standard statistical software packages because they are 
already familiar with procedures for fitting Poisson models (eg, PROC GENMOD in SAS). 
Third, a complex semi-parametric model can be fitted by simply treating event days as 
indicator variables when preparing summary data for person-time and number of events and 
then including them in the fixed effects model. 

Although we examined a wide variety of simulated data settings, we studied only normal 
and uniform distributions for the individual-specific random effects in simulations. Results 
should remain equivalent for conditional Poisson models and fixed effects models under 
other distributions since their likelihoods are proportional. Inconsistent estimation may 
be a potential issue when fitting fixed effects models if the number of random effects 
approaches infinity. However, in SCCS vaccine safety studies, the number of random effects 
to be estimated depends on the number of individuals with adverse events, which is usually 
small. Thus, inconsistency should not often be a problem when analyzing SCCS data. For 
a relatively large number of cases, the fixed effects models may take a relatively long time to 
fit. In this case, Whitaker et al. [10] suggested the use of absorbing factors to fit the models 
efficiently without estimating individual-specific random effects explicitly. 

In summary, for vaccine safety studies involving SCCS data, we recommend using a fixed 
effects modeling framework to estimate incidence rate ratios, as this approach offers several 
advantages over the traditionally used conditional Poisson model. More broadly, 
existing longitudinal data analysis tools may provide opportunities to better address research 
questions that arise in SCCS vaccination studies, such as how to use likelihood-based 
statistical methods to identify optimal risk windows for a given SCCS data set [17] and how to 
analyze SCCS data with misclassification of adverse events when adverse events are 
only partially chart reviewed because of resource limitations [18-20]. 
Table 1 

Type I error rates from 1000 simulations under the null hypothesis, $\beta_1 = 0$, when individual-specific random 
coefficients have a normal distribution. 

Simulation parameters      Average number     Type I error rates (%)
$\alpha$     $\sigma$      of cases           CP^a/FE^b     GEE^c
-5           1             488                4.0           4.3
-5           1.5           896                4.3           4.0
-6           1             179                3.7           4.0
-6           1.5           336                5.0           5.1
-7           1             66                 4.4           4.3
-7           1.5           124                2.8           2.5
-8           1             25                 6.2           6.0
-8           1.5           46                 4.1           4.0

a CP, conditional Poisson model. 
b FE, fixed effects model. 
c GEE, generalized estimating equations. 
Table 3 

Mean of absolute difference of vaccination effect coefficient from 1000 simulations with true $\beta_1 = 0.693$ when 
individual-specific random effects have a normal distribution. 

Simulation parameters      Mean of absolute difference (maximum)
$\alpha$     $\sigma$      CP^a/FE^b versus GEE^c
-5           1             0.0081 (0.0671)
-5           1.5           0.0086 (0.0873)
-6           1             0.0064 (0.1576)
-6           1.5           0.0080 (0.1931)
-7           1             0.0060 (0.1175)
-7           1.5           0.0099 (0.2884)
-8           1             0.0068 (0.2174)
-8           1.5           0.0127 (0.3393)

a CP, conditional Poisson model. 
b FE, fixed effects model. 
c GEE, generalized estimating equations. 
Table 4 

Estimated coefficients (standard error) using parametric method from US MMR-ITP data. 

                   CP^a/FE^b        GEE^c
Vaccination        2.00 (0.36)      2.04 (0.38)
Age (days)
  366-426          -1.50 (0.55)     -1.55 (0.61)
  427-487          -0.39 (0.44)     -0.39 (0.42)
  488-548          -0.30 (0.42)     -0.31 (0.44)
  549-609          -0.27 (0.42)     -0.31 (0.43)
  610-670          -0.14 (0.41)     -0.16 (0.42)

a CP, conditional Poisson model. 
b FE, fixed effects model. 
c GEE, generalized estimating equations. 
Table 5 

Estimated coefficients (standard error) by gender using parametric method from US MMR-ITP data. 

                   Subgroup CP^a/FE^b with gender*age      FE with exposure*gender and age
                   and exposure*gender
                   Female           Male                   Female           Male
Vaccination        2.64 (0.64)      1.85 (0.45)            1.90 (0.51)      2.01 (0.44)
Age (days)                                                 (common to both genders)
  366-426          -3.11 (1.00)     -0.42 (0.75)           -1.49 (0.56)
  427-487          -0.84 (0.60)     0.15 (0.68)            -0.39 (0.44)
  488-548          -1.37 (0.66)     0.62 (0.63)            -0.28 (0.42)
  549-609          -0.40 (0.53)     -0.06 (0.69)           -0.28 (0.42)
  610-670          -0.58 (0.57)     -0.43 (0.64)           -0.14 (0.41)

a CP, conditional Poisson model. 
b FE, fixed effects model. 